Parallel Translations As Sense Discriminators

Author

  • Nancy M. Ide
Abstract

This article reports the results of a preliminary analysis of translation equivalents in four languages from different language families, extracted from an on-line parallel corpus of George Orwell's Nineteen Eighty-Four. The goal of the study is to determine the degree to which translation equivalents for different meanings of a polysemous word in English are lexicalized differently across a variety of languages, and to determine whether this information can be used to structure or create a set of sense distinctions useful in natural language processing applications. A coherence index is computed that measures the tendency for different senses of the same English word to be lexicalized differently, and from this data a clustering algorithm is used to create sense hierarchies.
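The abstract does not give the formula for the coherence index or name the clustering algorithm, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes the distance between two senses is the fraction of languages in which they receive different translations, and it assumes single-linkage agglomerative clustering. The function names (`sense_distance`, `linkage`, `cluster`) and the toy data are invented for illustration and are not taken from the paper.

```python
from itertools import combinations

# Invented toy data (not from the paper): sense label -> {language -> translation token}.
translations = {
    "sense1": {"lang_a": "x1", "lang_b": "y1", "lang_c": "z1", "lang_d": "u1"},
    "sense2": {"lang_a": "x1", "lang_b": "y1", "lang_c": "z1", "lang_d": "u2"},
    "sense3": {"lang_a": "x2", "lang_b": "y2", "lang_c": "z2", "lang_d": "u3"},
}

def sense_distance(table, a, b):
    """Assumed measure: share of languages that translate senses a and b differently."""
    shared = table[a].keys() & table[b].keys()
    if not shared:
        return 1.0  # no shared language, treat the senses as maximally distinct
    differing = sum(1 for lang in shared if table[a][lang] != table[b][lang])
    return differing / len(shared)

def linkage(table, x, y):
    """Single-linkage distance between two clusters of senses."""
    return min(sense_distance(table, a, b) for a in x for b in y)

def cluster(table):
    """Merge the closest clusters repeatedly; return the merges in order."""
    clusters = [frozenset([s]) for s in sorted(table)]
    merges = []
    while len(clusters) > 1:
        x, y = min(combinations(clusters, 2),
                   key=lambda pair: linkage(table, pair[0], pair[1]))
        merges.append((x, y, linkage(table, x, y)))
        clusters = [c for c in clusters if c != x and c != y] + [x | y]
    return merges

merges = cluster(translations)
```

Read bottom-up, the list of merges forms a hierarchy: senses that share translations in most languages join early at a small distance, while senses that are lexicalized differently everywhere join last.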


Similar resources

Memory-based Learning of Word Translation

A basic task in machine translation is to choose the right translation for source words with several possible translations in the target language. In this paper we treat word translation as a word sense disambiguation problem and train memory-based classifiers on words with alternative translations. The training data was automatically labeled with the corresponding translations by word-aligning...


Construction of a Benchmark Data Set for Cross-lingual Word Sense Disambiguation

Given the recent trend to evaluate the performance of word sense disambiguation systems in a more application-oriented set-up, we report on the construction of a multilingual benchmark data set for cross-lingual word sense disambiguation. The data set was created for a lexical sample of 25 English nouns, for which translations were retrieved in 5 languages, namely Dutch, German, French, Italian...


Cross-Lingual Word Sense Disambiguation

Word Sense Disambiguation using Cross-Lingual approach has been used successfully for languages like Farsi and Hindi. However, a comparable corpus in the form of Wikipedia articles available in English and Hindi has been used for such a task. This motivated us to further the approach and test the results when a parallel corpus is used. In this project, we specifically wanted to observe if the a...


Using Parallel Corpora for Word Sense Disambiguation

Word Sense Disambiguation (WSD) is the Natural Language Processing (NLP) task that consists in selecting the correct sense of a polysemous word in a given context. Most state-of-the-art WSD systems are supervised classifiers that are trained on manually sense-tagged corpora, which are very time-consuming and expensive to build. In order to overcome this acquisition bottleneck (sense-tagged corp...


Cross-lingual WSD for Translation Extraction from Comparable Corpora

We propose a data-driven approach to enhance translation extraction from comparable corpora. Instead of resorting to an external dictionary, we translate source vector features by using a cross-lingual Word Sense Disambiguation method. The candidate senses for a feature correspond to sense clusters of its translations in a parallel corpus and the context used for disambiguation consists of the ...




Publication date: 1999